

Hyperplane Arrangements of Trained ConvNets Are Biased

Gamba, Matteo, Carlsson, Stefan, Azizpour, Hossein, Björkman, Mårten

arXiv.org Artificial Intelligence

In recent years, understanding and interpreting the inner workings of deep networks has drawn considerable attention from the community [7, 15, 16, 13]. One long-standing question is the problem of identifying the inductive bias of state-of-the-art networks and the form of implicit regularization that is performed by the optimizer [22, 31, 2] and possibly by natural data itself [3]. While earlier studies focused on the theoretical expressivity of deep networks and the advantage of deeper representations [20, 25, 26], a recent trend in the literature is the study of the effective capacity of trained networks [31, 32, 9, 10]. In fact, while state-of-the-art deep networks are largely overparametrized, it is hypothesized that the full theoretical capacity of a model might not be realized in practice, due to some form of self-regulation at play during learning. Some recent works have, thus, tried to find statistical bias consistently present in trained state-of-the-art models that is interpretable and correlates well with generalization [14, 24]. In this work, we take a geometrical perspective and look for statistical bias in the weights of trained convolutional networks, in terms of hyperplane arrangements induced by convolutional layers with ReLU activations.


Word2Vec

#artificialintelligence

Word2Vec is a two-layer neural network, based on the Continuous Bag-of-Words (CBOW) and Skip-gram architectures, that captures semantic information. It generates word embeddings (mappings of words into a vector space) for a given text corpus. It converts words into vectors, and these vectors support operations such as addition, subtraction, and distance computation that preserve the relationships among words. How are these relationships formed? Word2Vec assigns similar vector representations to similar words.
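As a concrete illustration of these vector operations, the sketch below uses hand-crafted 2-dimensional toy vectors (not real trained embeddings, which typically have 100 or more dimensions learned from a corpus) to show how cosine similarity and vector arithmetic can surface relationships such as the classic king - man + woman ≈ queen analogy:

```python
import math

# Toy 2-d "embeddings" chosen by hand for illustration only;
# a trained Word2Vec model would learn such vectors from text.
vectors = {
    "king":  [0.9, 0.8],
    "man":   [0.5, 0.2],
    "woman": [0.5, 0.9],
    "queen": [0.9, 1.5],
}

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Vector arithmetic: king - man + woman
analogy = [k - m + w for k, m, w in
           zip(vectors["king"], vectors["man"], vectors["woman"])]

# The word whose vector is most similar (by cosine) to the result:
nearest = max(vectors, key=lambda word: cosine(analogy, vectors[word]))
print(nearest)  # queen
```

With real embeddings the same arithmetic is done in the learned vector space, and the nearest neighbour is found over the whole vocabulary.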


Understanding Softmax Confidence and Uncertainty

Pearce, Tim, Brintrup, Alexandra, Zhu, Jun

arXiv.org Machine Learning

It is often remarked that neural networks fail to increase their uncertainty when predicting on data far from the training distribution. Yet naively using softmax confidence as a proxy for uncertainty achieves modest success in tasks exclusively testing for this, e.g., out-of-distribution (OOD) detection. This paper investigates this contradiction, identifying two implicit biases that do encourage softmax confidence to correlate with epistemic uncertainty: 1) Approximately optimal decision boundary structure, and 2) Filtering effects of deep networks. It describes why low-dimensional intuitions about softmax confidence are misleading. Diagnostic experiments quantify reasons softmax confidence can fail, finding that extrapolations are less to blame than overlap between training and OOD data in final-layer representations. Pre-trained/fine-tuned networks reduce this overlap.


Perceptron -- Deep Learning Basics – Hacker Noon

#artificialintelligence

The perceptron is a fundamental unit of the neural network: it takes weighted inputs, processes them, and is capable of performing binary classification. In this post, we will discuss the working of the perceptron model. This is a follow-up to my previous post on the McCulloch-Pitts neuron. In 1958, Frank Rosenblatt proposed the perceptron, a more generalized computational model than the McCulloch-Pitts neuron. The key feature of Rosenblatt's perceptron was the introduction of weights for the inputs.
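A minimal sketch of the perceptron described above (trained here on the AND function, a toy task chosen for illustration, not taken from the original post): inputs are weighted and summed, the sum is thresholded to produce a binary output, and the weights are nudged whenever a prediction is wrong:

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum exceeds 0, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Rosenblatt's learning rule: w += lr * (label - prediction) * x."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = y - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # converged: every sample classified correctly
            break
    return weights, bias

# Binary AND is linearly separable, so the perceptron converges on it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

The convergence guarantee holds only for linearly separable data; on XOR, for example, the loop simply exhausts its epochs.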


Information retrieval document search using vector space model in R

@machinelearnbot

Note that there are many variations in how the term frequency (tf) and inverse document frequency (idf) are calculated; in this post we have seen one of them. The images below show other recommended variations of tf and idf, taken from Wikipedia. Mathematically, the closeness of two vectors is measured by the cosine of the angle between them. So, to find the documents relevant to a query, we compute the similarity score between each document vector and the query vector using cosine similarity, and rank documents by that score.
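The original post works in R; the sketch below re-implements the same idea in Python on a made-up three-document corpus. Each document and the query are mapped to tf-idf vectors over a shared vocabulary, and documents are ranked by the cosine of the angle to the query vector:

```python
import math

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "stocks fell sharply today"]
query = "cat mat"

tokenized = [d.split() for d in docs]
vocab = sorted({t for tokens in tokenized for t in tokens} | set(query.split()))

def tf(term, tokens):
    return tokens.count(term) / len(tokens)          # length-normalized count

def idf(term):
    df = sum(term in tokens for tokens in tokenized)  # document frequency
    return math.log(len(tokenized) / df) if df else 0.0

def tfidf_vector(tokens):
    return [tf(t, tokens) * idf(t) for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

q = tfidf_vector(query.split())
scores = [cosine(tfidf_vector(tokens), q) for tokens in tokenized]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the first document shares both query terms
```

The third document shares no terms with the query, so its score is exactly zero; the first document contains both "cat" and the rarer "mat" and therefore ranks highest.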


Estimating the coefficients of a mixture of two linear regressions by expectation maximization

Klusowski, Jason M., Yang, Dana, Brinda, W. D.

arXiv.org Machine Learning

The Expectation-Maximization (EM) algorithm is a widely used technique for parameter estimation. It is an iterative procedure that monotonically increases the likelihood. When the likelihood is not concave, it is well known that EM can converge to a non-global optimum. However, recent work has sidestepped the question of whether EM reaches the likelihood maximizer, instead working out statistical guarantees on its loss directly. These explorations have identified regions of initialization for which the EM estimate approaches the true parameter in probability, assuming the model is well specified. This line of research was spurred by [1], which established general conditions under which a ball centered at the true parameter is a basin of attraction for the population version of the EM operator. For a large enough sample size, the difference (in that ball) between the sample EM operator and the population EM operator can be bounded, so that the EM estimate approaches the true parameter with high probability. That bound is the sum of two terms with distinct interpretations.
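To make the setting concrete, here is a minimal sketch of EM for a mixture of two linear regressions. This is an illustration under simplifying assumptions (known noise level, equal mixing weights, one-dimensional covariates, only the two slopes estimated), not the authors' procedure. The E-step computes the posterior probability that each point came from the first component; the M-step solves a weighted least-squares problem for each slope:

```python
import math
import random

rng = random.Random(0)

# Simulated data: y = b*x + noise, with slope b = +2 or b = -2
# picked uniformly per point (noise std 0.3, x bounded away from 0
# so the two lines are well separated on every sample).
true_slopes = (2.0, -2.0)
data = []
for _ in range(200):
    x = rng.uniform(0.5, 1.5)
    b = true_slopes[0] if rng.random() < 0.5 else true_slopes[1]
    data.append((x, b * x + rng.gauss(0.0, 0.3)))

def em_two_regressions(data, b1, b2, sigma=0.3, iters=50):
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point,
        # proportional to the Gaussian likelihood of its residual.
        r = []
        for x, y in data:
            l1 = math.exp(-((y - b1 * x) ** 2) / (2 * sigma ** 2))
            l2 = math.exp(-((y - b2 * x) ** 2) / (2 * sigma ** 2))
            r.append(l1 / (l1 + l2))
        # M-step: weighted least squares for each slope.
        b1 = (sum(ri * x * y for ri, (x, y) in zip(r, data)) /
              sum(ri * x * x for ri, (x, y) in zip(r, data)))
        b2 = (sum((1 - ri) * x * y for ri, (x, y) in zip(r, data)) /
              sum((1 - ri) * x * x for ri, (x, y) in zip(r, data)))
    return b1, b2

# Initialization inside the basin of attraction of the true parameter.
b1, b2 = em_two_regressions(data, b1=1.0, b2=-1.0)
```

Starting instead from an initialization aligned with the wrong component (e.g. b1 negative, b2 positive) would converge to the label-swapped solution, which is the kind of non-global behavior the basin-of-attraction analysis addresses.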